We investigate function estimation in nonparametric regression models with random design and heteroscedastic correlated noise. Adaptive properties of warped wavelet nonlinear approximations are studied over a wide range of Besov scales, $f \in B^s_{\pi,r}$, and for a variety of $L^p$ error measures. We consider error distributions with Long-Range-Dependence parameter $\alpha$, $0 < \alpha < 1$; heteroscedasticity is modeled with a design dependent function $\sigma$. We prescribe a tuning paradigm under which warped wavelet estimation achieves partial or full adaptivity results, with rates that are shown to be the minimax rates of convergence. For $p > 2$, it is seen that there are three rate phases, namely the dense, sparse and long range dependence phase, depending on the relative values of $s$, $p$, $\pi$ and $\alpha$. Furthermore, we show that long range dependence does not come into play for shape estimation $f - \int f$. The theory is illustrated with some numerical examples.

1. Introduction. 

1.1. Random design regression with LRD errors. Consider the random design regression model

$$Y_i = f(X_i) + \sigma(X_i)\varepsilon_i, \qquad i = 1, \dots, n, \qquad (1.1)$$

where the $X_i$'s are independent identically distributed (i.i.d.) random variables with a compactly supported density $g$, $\sigma(\cdot)$ is a deterministic function and $(\varepsilon_i)_{i \ge 1}$ is a stationary Gaussian sequence that is independent of the $X_i$'s. The long range dependence (LRD) of the $\varepsilon_i$'s is described by a linear structure

$$\varepsilon_i = \sum_{m=0}^{\infty} a_m \eta_{i-m}, \qquad a_0 = 1, \qquad (1.2)$$

where $(\eta_i)$ is an i.i.d. Gaussian sequence and $\lim_{m \to \infty} a_m m^{(\alpha+1)/2} = 1$ for $\alpha \in (0,1)$, so that

$$\mathrm{Var}\Big(\sum_{i=1}^{n} \varepsilon_i\Big) \sim c_\alpha n^{2-\alpha}, \qquad (1.3)$$

where $c_\alpha$ is a finite and positive constant. In view of (1.3), the limit case $\alpha = 1$ can be thought of as similar to the case of weakly dependent errors.
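The growth rate in (1.3) can be checked numerically. The sketch below is only an illustration under assumptions of ours: it takes the particular weights $a_m = m^{-(\alpha+1)/2}$ for $m \ge 1$ (which satisfy the limit condition above), truncates the series (1.2) after $M = 2^{18}$ terms, and estimates the exponent of $\mathrm{Var}(\sum_{i \le n}\varepsilon_i)$ from $n = 1000$ and $n = 4000$; the result should be close to $2 - \alpha$.

```python
import numpy as np

def ma_coefficients(alpha, M):
    """Hypothetical LRD weights: a_0 = 1 and a_m = m^{-(alpha+1)/2} for m >= 1."""
    a = np.ones(M)
    m = np.arange(1, M, dtype=float)
    a[1:] = m ** (-(alpha + 1) / 2)
    return a

def autocovariances(a, max_lag):
    """gamma(h) = sum_m a_m a_{m+h} for unit-variance innovations eta_i.

    Zero-padding to length 2M makes the FFT-based (circular) correlation
    coincide with the linear one for lags h < M.
    """
    L = 2 * len(a)
    F = np.fft.rfft(a, L)
    acf = np.fft.irfft(F * np.conj(F), L)
    return acf[:max_lag]

def var_partial_sum(gamma, n):
    """Var(eps_1 + ... + eps_n) = n*gamma(0) + 2*sum_{h=1}^{n-1} (n-h)*gamma(h)."""
    h = np.arange(1, n)
    return n * gamma[0] + 2.0 * np.sum((n - h) * gamma[1:n])

alpha = 0.6
gamma = autocovariances(ma_coefficients(alpha, 2**18), 4001)
v1, v2 = var_partial_sum(gamma, 1000), var_partial_sum(gamma, 4000)
slope = np.log(v2 / v1) / np.log(4.0)
print(f"estimated exponent {slope:.3f}, theory 2 - alpha = {2 - alpha}")
```

The two-point slope is a crude estimator, and the truncation slightly underestimates long-lag covariances, so agreement with $2 - \alpha$ should be expected only up to a small tolerance.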

1.2. Prologue: linear regression. Consider the regression model (1.1) with $f(x) = a + bx$ and correlated errors (1.2). We refer to Chapter 9 of [1], where the case $\sigma(x) = 1$ is treated. The least squares (LS) estimator of $b$ is

$$\hat b = \frac{\sum_{i=1}^{n}(X_i - \bar X)Y_i}{\sum_{i=1}^{n}(X_i - \bar X)^2};$$

the asymptotic properties of $\hat b - b$ depend on $\sum_{i=1}^{n} X_i\sigma(X_i)\varepsilon_i$. Then,

$$\mathrm{Var}\Big(\sum_{i=1}^{n} X_i\sigma(X_i)\varepsilon_i\Big) = nE\big(X_1^2\sigma^2(X_1)\big)E(\varepsilon_1^2) + \big(E(\sigma(X_1)X_1)\big)^2\sum_{i \ne j}^{n}\mathrm{Cov}(\varepsilon_i, \varepsilon_j). \qquad (1.4)$$

In this setting, the LS estimator is $\sqrt{n}$-consistent when the latter term is of order $O(n)$. Since, by (1.3), $\sum_{i \ne j}\mathrm{Cov}(\varepsilon_i, \varepsilon_j)$ grows like $n^{2-\alpha}$, this occurs if and only if

$$E(\sigma(X_1)X_1) = 0. \qquad (1.5)$$

For $\sigma(\cdot) = 1$, this is always true when $E(X_1) = 0$, and, if $E(X_1) \ne 0$, it is enough to center the design variables, $X_i - \bar X$, where $\bar X = n^{-1}\sum_{i=1}^{n} X_i$. When $\sigma(\cdot) \ne 1$, condition (1.5) is not necessarily fulfilled, even if $E(X_1) = 0$. This is illustrated in the long range dependence literature. For example, [26] derived $\sqrt{n}$-consistency of a generalized least squares estimator when $\sigma(\cdot) = 1$. Condition (1.5) appears in Assumption 1 and Theorems 2.1 and 2.2 of [13]. This example suggests that, even in a simple parametric setting, statistical properties of LS estimators depend on the behaviour of $\sigma(\cdot)$ with respect to the design distribution. For example, if the design is uniformly distributed, $X_i \sim \mathcal{U}[-1,1]$, then (1.5) reads

$$\int_{-1}^{1}\sigma(u)\,u\,du = 0, \qquad (1.6)$$

which holds for any even function $\sigma$. Note, however, that, in practice, $\sigma$ is not observable.
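To make (1.6) concrete, the sketch below approximates $E(\sigma(X_1)X_1)$ for $X_1 \sim \mathcal{U}[-1,1]$ with a midpoint rule. The two noise functions are illustrative choices of ours, not taken from the paper: an even $\sigma$, for which (1.5) holds, and a non-even one, for which $E(\sigma(X_1)X_1) = E(X_1^2) = 1/3 \ne 0$.

```python
import numpy as np

def mean_sigma_x(sigma, n_grid=200_000):
    """Midpoint-rule approximation of E(sigma(X) X) for X ~ U[-1, 1] (density 1/2)."""
    u = -1.0 + (np.arange(n_grid) + 0.5) * (2.0 / n_grid)
    return np.sum(sigma(u) * u * 0.5) * (2.0 / n_grid)

even = lambda u: 1.0 + u**2      # even noise level: condition (1.5) holds
not_even = lambda u: 1.0 + u     # not even: E(sigma(X) X) = E(X^2) = 1/3

print(mean_sigma_x(even), mean_sigma_x(not_even))
```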

1.3. Background: nonparametric regression. The model (1.1) with general error terms of the form $\sigma(X_i, \varepsilon_i)$ was considered in [5]. Asymptotic properties of the Nadaraya-Watson kernel estimator are found in [8], where the $\varepsilon_i$'s are assumed to be a functional of LRD Gaussian random variables; in [9], with $\varepsilon_i$ as an infinite order moving average; and in [24], with the $X_i$'s possibly LRD and not necessarily independent of the $\varepsilon_i$'s. Local linear estimation using the kernel method was studied in [22] and [23] and, in the case of FARIMA-GARCH errors, in [1] and [2]. The corresponding results for density estimation were obtained in [4, 6, 15] and [28].

A general message from these papers is that the limiting behaviour of nonparametric estimators depends on a delicate balance between the smoothing parameter (e.g., bandwidth) and the long memory parameter $\alpha$. To be more specific, we quote the following result from [29], derived in the model (1.1) with $\sigma = 1$:

$$R_{n,2,g}(B^s_{\pi,r}) \asymp n^{-\min(2s/(2s+1),\,\alpha)}, \qquad (1.7)$$

where $R_{n,p,g}(B^s_{\pi,r})$ denotes the minimax weighted $L^p$-risk over a Besov space,

$$R_{n,p,g}(B^s_{\pi,r}) := \inf_{\hat f}\sup_{f \in B^s_{\pi,r}} E\|\hat f - f\|^p_{L^p(g)}, \qquad (1.8)$$

with

$$\|f - h\|^p_{L^p(g)} = \int |f(x) - h(x)|^p g(x)\,dx.$$

We refer to Section 2.2 for the precise definition of Besov spaces in terms of wavelet coefficients. Here, $s$ is related to the smoothness of the target function $f$, whereas $\pi$ and $r$ are scale parameters. In (1.7) we see that there is an elbow in the rate of convergence and, hence, that the best possible rate depends on the relative value of $s$ and $\alpha$. For small values of $\alpha$, LRD has a detrimental effect on the rates of convergence, whereas, for larger values of $\alpha$, we obtain the same rate as if the errors were independent. This is of importance in the development of adaptive tuning procedures since, in practice, neither $s$ nor $\alpha$ is known (note, however, that $\alpha$ can be estimated). While, for $\alpha = 1$, different data-driven methods (e.g., cross-validation, plug-in) have been implemented for choosing the bandwidth (see, e.g., [30]), for $\alpha < 1$ the effect of LRD may influence such procedures. We refer to [6] and [15] for detailed studies in the density case. We are not aware of such considerations in the random design regression setting; however, similar phenomena are anticipated.
Indeed, few adaptive methods for curve estimation in the presence of long memory in the errors are available. To the best of our knowledge, [12] is one of the few papers in this direction, where an orthogonal series estimator with an adaptive stopping rule is shown to achieve the minimax rate, similar to that of (1.7), in the model (1.1) with $\sigma(\cdot) = 1$. In [12], it was also noticed that the rate of convergence for shape estimation $f^* = f - \int f$ does not involve $\alpha$ and is the same as if the errors were independent. This observation was later confirmed by the minimax results of [29].

1.4. Rates of convergence of wavelet estimators. In this paper, we study adaptive function estimation in the model (1.1); performances of estimators are given with respect to various $L^p$, $p \ge 2$, error measures. Introducing the maximal risk

$$R_{n,p,g}(\hat f_n, B^s_{\pi,r}) := \sup_{f \in B^s_{\pi,r}} E\|f - \hat f_n\|^p_{L^p(g)}, \qquad (1.9)$$

we consider nonlinear warped wavelet estimators of the form

$$\hat f_n(x) = \sum_{(j,k) \in \Lambda_1}\hat\beta_{j,k}\,I\{|\hat\beta_{j,k}| > \lambda\}\,\psi_{j,k}(G(x)),$$

where $G(x) = \int_{-\infty}^{x} g(u)\,du$ is the design distribution function and $(\psi_{j,k})$ is a wavelet family with enough regularity. We show that the statistical parameters $\hat\beta_{j,k}$ and the tuning parameters $\lambda$, $\Lambda_1$ may be constructed independently of $s$ and $\pi$ to achieve near optimal results. Moreover, the tuning parameter $\lambda$ can be chosen independently of $\alpha$ as long as, for all $j \ge 0$ and $k$,

$$E\big(\psi_{j,k}(G(X_1))\sigma(X_1)\big) = 0. \qquad (1.10)$$

Note that, for $\sigma(\cdot) = 1$, condition (1.10) is always satisfied, since wavelets are orthogonal to constants (Haar family included). We note the similarity between condition (1.5) in the parametric setting and condition (1.10) in the nonparametric scenario.

Introducing the rate exponents

$$\alpha_D := \frac{2s}{2s+1}, \qquad \alpha_S := \frac{2\big(s - (1/\pi - 1/p)\big)}{2(s - 1/\pi) + 1}, \qquad (1.11)$$

we will show that

$$R_{n,p,g}(\hat f_n, B^s_{\pi,r}) \le C n^{-p\gamma/2}(\log n)^{\kappa},$$

where

$$\gamma = \begin{cases} \alpha_D, & \text{if } \alpha > \alpha_D \text{ and } s > \frac{p-\pi}{2\pi} \text{ (dense phase)},\\ \alpha_S, & \text{if } \alpha > \alpha_S \text{ and } \frac{1}{\pi} < s < \frac{p-\pi}{2\pi} \text{ (sparse phase)},\\ \alpha, & \text{if } \alpha < \min(\alpha_S, \alpha_D) \text{ (LRD phase)}, \end{cases} \qquad (1.12)$$

and $\kappa > 0$. This shows that convergence rates depend on the relative value of $\alpha$ with respect to $s$, but also on the relative value of $s$ with respect to $p$ and $\pi$.
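The case distinction in (1.12) is easy to mechanize. The sketch below (function names are ours) computes $\alpha_D$, $\alpha_S$ and the phase for given $(s, p, \pi, \alpha)$. A short calculation from (1.11) shows that, for $s > 1/\pi$, $\alpha_D < \alpha_S$ holds exactly when $s > (p - \pi)/(2\pi)$; hence, wherever (1.12) applies, the exponent is simply $\gamma = \min(\alpha, \alpha_D, \alpha_S)$, which the loop at the end checks on a small grid.

```python
import itertools
import numpy as np

def rate_exponents(s, p, pi_):
    """Rate exponents alpha_D and alpha_S of (1.11); pi_ is the Besov index pi."""
    alpha_d = 2 * s / (2 * s + 1)
    alpha_s = 2 * (s - (1 / pi_ - 1 / p)) / (2 * (s - 1 / pi_) + 1)
    return alpha_d, alpha_s

def phase(s, p, pi_, alpha):
    """Phase name and exponent gamma following the case distinction (1.12).

    Assumes s > 1/pi_. Ties such as alpha == alpha_D are sent to the LRD
    branch; that tie-breaking is a convention of this sketch only.
    """
    alpha_d, alpha_s = rate_exponents(s, p, pi_)
    if s > (p - pi_) / (2 * pi_):                 # dense region
        return ("dense", alpha_d) if alpha > alpha_d else ("LRD", alpha)
    return ("sparse", alpha_s) if alpha > alpha_s else ("LRD", alpha)

print(phase(1.0, 2, 2.0, 0.9))    # dense phase, gamma = 2/3
print(phase(1.0, 2, 2.0, 0.5))    # LRD phase, gamma = 0.5
print(phase(1.0, 6, 1.5, 0.9))    # sparse phase, gamma = alpha_S (0.6 up to rounding)

# On a grid of admissible parameters, gamma coincides with min(alpha, alpha_D, alpha_S).
for s, p, pi_, alpha in itertools.product([0.8, 1.0, 2.0, 3.5], [2, 4, 8],
                                          [1.5, 2.0, 3.0], [0.2, 0.5, 0.95]):
    if s > 1 / pi_:
        _, gamma = phase(s, p, pi_, alpha)
        assert np.isclose(gamma, min(alpha, *rate_exponents(s, p, pi_)))
```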

We also show that our rates are optimal (up to a log term) in the minimax sense. Consequently, we generalize the result (1.7) to $p > 2$ and heteroscedastic errors. In particular, for $p = 2$ the rates agree with Yang's optimal rate, up to a multiplicative log penalty, which is usual for adaptation. For $p > 2$ our results show that there are two elbows and three phases in the convergence rates, namely the dense phase, the sparse phase and the long range dependence phase. This is illustrated in Figure 1.

Furthermore, we show that, in the case of estimating the shape $f - \int f$, there is no LRD phase, which agrees with the previous findings in [12] and [29]. Finally, we also show that, in the nonlinear wavelet estimator, we may replace $G(\cdot)$ with the corresponding empirical distribution function, and the resulting estimator still achieves the minimax rates.

2. Preliminaries. 

2.1. Warped wavelets. Consider an orthonormal wavelet basis on the interval $I = [0,1]$, $\{\phi_{j,k}(x), \psi_{j,k}(x)\}$, where $\phi$ denotes the scaling function and $\psi$ denotes the wavelet. Here, $j \ge 0$, $k = 0, \dots, 2^j - 1$ and $\phi_{j,k}(x) = 2^{j/2}\phi(2^j x - k)$, $\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$. We refer to Chapter 7.5 of [21] for the construction of such a basis. For any function $f \in L^2[0,1]$, we have the following representation:

$$f(x) = \sum_{k=0}^{2^{j_0}-1}\alpha_{j_0,k}\phi_{j_0,k}(x) + \sum_{j=j_0}^{\infty}\sum_{k=0}^{2^j-1}\beta_{j,k}\psi_{j,k}(x), \qquad (2.1)$$

where

$$\beta_{j,k} = \int_0^1 f(x)\psi_{j,k}(x)\,dx \qquad (2.2)$$

denotes the wavelet coefficients associated to $f$, with the obvious corresponding definition for the scaling coefficients $\alpha_{j_0,k}$. The transformation (2.2) is called the wavelet transform (WT), and the representation (2.1) is called the inverse wavelet transform (IWT). In the case where $f$ is observed on a regular grid $i/n$, $i = 1, 2, \dots, n$, both the WT and IWT can be computed in $O(n)$ steps using Mallat's pyramid algorithm. In the case where the function $f$ is observed along a random grid, the implementation of the standard WT (2.2) and IWT (2.1) requires some extra care.
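For concreteness, here is a minimal sketch of Mallat's pyramid algorithm for the Haar wavelet on a regular grid of dyadic length $n$. Haar is our choice for brevity (the simulations below use Daubechies filters through wavethresh). Each step halves the data, so the total work is $n/2 + n/4 + \dots$, i.e. linear in $n$; the discrete coefficients approximate $\sqrt{n}$ times the continuous coefficients in (2.2).

```python
import numpy as np

def haar_dwt(x):
    """Orthonormal Haar DWT by Mallat's pyramid algorithm.

    Returns (c0, details): c0 is the coarsest scaling coefficient and
    details[j] holds the 2**j detail coefficients of level j (coarse to fine).
    Requires len(x) to be a power of two.
    """
    c = np.asarray(x, dtype=float)
    n = c.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    while c.size > 1:
        even, odd = c[0::2], c[1::2]
        details.append((even - odd) / np.sqrt(2.0))   # wavelet (detail) part
        c = (even + odd) / np.sqrt(2.0)               # scaling (smooth) part
    return c[0], details[::-1]

def haar_idwt(c0, details):
    """Inverse of haar_dwt: rebuild the signal from coarse to fine levels."""
    c = np.array([c0], dtype=float)
    for d in details:
        out = np.empty(2 * c.size)
        out[0::2] = (c + d) / np.sqrt(2.0)
        out[1::2] = (c - d) / np.sqrt(2.0)
        c = out
    return c

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
c0, det = haar_dwt(x)
print(len(det), np.allclose(haar_idwt(c0, det), x))   # 10 levels, perfect reconstruction
```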

A warped wavelet basis [19] is a modified wavelet basis representation specifically designed to handle the random design regression model (1.1). The modification accommodates the design distribution function $G(\cdot)$. Provided that $f \circ G^{-1} \in L^2[0,1]$, we have the following representation:

$$f(x) = \sum_{k=0}^{2^{j_0}-1}\alpha_{j_0,k}\phi_{j_0,k}(G(x)) + \sum_{j=j_0}^{\infty}\sum_{k=0}^{2^j-1}\beta_{j,k}\psi_{j,k}(G(x)) \qquad (2.3)$$

with

$$\beta_{j,k} = \int_0^1 f(x)g(x)\psi_{j,k}(G(x))\,dx, \qquad (2.4)$$

where $g(x) = G'(x)$, and $\alpha_{j,k}$ is defined as in (2.4), with $\phi$ in place of $\psi$. By analogy with the standard case, we will refer to (2.4) and (2.3) as the warped wavelet transform (WWT) and the inverse warped wavelet transform (IWWT), respectively. Note that, by changing the variable in (2.4),

$$\beta_{j,k} = \int_0^1 f(G^{-1}(x))\psi_{j,k}(x)\,dx. \qquad (2.5)$$

This shows that the WWT of $f$ is equivalent to the WT of $f \circ G^{-1}$. In the case where the function $f$ is observed along a random sequence $X_i$ with density $g$, the WWT and IWWT can be implemented in practice using a modification of Mallat's pyramid algorithm. This is further detailed in Section 5.1.
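The change of variables between (2.4) and (2.5) can be checked by quadrature. In the sketch below, the design density $g(x) = (a+1)x^a$ (so that $G(x) = x^{a+1}$), the target $f(x) = \cos(3x)$ and the Haar wavelet are all hypothetical choices made for illustration; both integrals are approximated by a fine midpoint rule and should agree up to discretization error.

```python
import numpy as np

def haar_psi(j, k, u):
    """Haar wavelet psi_{j,k}(u) = 2^{j/2} psi(2^j u - k) on [0, 1)."""
    t = (2.0 ** j) * u - k
    return (2.0 ** (j / 2)) * (((0.0 <= t) & (t < 0.5)).astype(float)
                                - ((0.5 <= t) & (t < 1.0)).astype(float))

def midpoint(h, n_grid=400_000):
    """Midpoint rule for the integral of h over [0, 1]."""
    x = (np.arange(n_grid) + 0.5) / n_grid
    return np.mean(h(x))

a = 1.0                                   # hypothetical design density g(x) = (a+1) x^a
g = lambda x: (a + 1) * x**a
G = lambda x: x ** (a + 1)
G_inv = lambda u: u ** (1.0 / (a + 1))
f = lambda x: np.cos(3 * x)               # hypothetical target function

j, k = 2, 1
beta_24 = midpoint(lambda x: f(x) * g(x) * haar_psi(j, k, G(x)))   # form (2.4)
beta_25 = midpoint(lambda u: f(G_inv(u)) * haar_psi(j, k, u))      # form (2.5)
print(beta_24, beta_25)
```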

2.2. Besov scales. Throughout this paper, we will assume that $\psi$ is a compactly supported wavelet with $q > s$ vanishing moments and sufficient regularity (see Chapter 9 of [21]). We further assume that the corresponding wavelet basis $(\psi_{j,k})$ satisfies the Temlyakov property as stated in [18]. Typical examples include the Daubechies wavelet family with $q$ vanishing moments. Finally, we consider a wavelet basis on the interval $[0,1]$ with appropriate boundary modifications (see [7]). In the light of (2.5), it is natural to express the smoothness condition with respect to $f \circ G^{-1}$ rather than $f$, as in [19]. We assume that $f \circ G^{-1} \in B^s_{\pi,r}([0,1])$, where $s > \max\{1/\pi, 1/2\}$. Writing $f \circ G^{-1} = \sum_{j,k}\beta_{j,k}\psi_{j,k}$, the latter condition may be written as $f \circ G^{-1} \in L^\pi([0,1])$ and

$$\Big(\sum_{j \ge 0} 2^{j(s + 1/2 - 1/\pi)r}\Big(\sum_{0 \le k < 2^j}|\beta_{j,k}|^{\pi}\Big)^{r/\pi}\Big)^{1/r} < \infty.$$

The parameter $s$ can be thought of as being related to the number of derivatives of $f$. With different values of $\pi$ and $r$, the Besov spaces capture a variety of smoothness features in a function, including spatially inhomogeneous behaviour.
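The sequence norm above is straightforward to evaluate from a list of coefficient levels. The sketch below (the function name is ours) also shows the two extreme configurations used later in the lower-bound proofs: a single nonzero coefficient at level $j$ has norm $2^{j(s+1/2-1/\pi)}|\beta_{j,k}|$, while a full level of equal coefficients $\gamma$ has norm $2^{j(s+1/2)}\gamma$.

```python
import numpy as np

def besov_seq_norm(details, s, pi_, r):
    """Besov sequence norm of Section 2.2 computed from wavelet coefficients.

    details[j] holds the 2**j coefficients beta_{j,k}, j = 0, 1, ...;
    returns ( sum_j [2^{j(s+1/2-1/pi)} (sum_k |beta_{j,k}|^pi)^{1/pi}]^r )^{1/r}.
    """
    total = 0.0
    for j, beta in enumerate(details):
        level = np.sum(np.abs(beta) ** pi_) ** (1.0 / pi_)
        total += (2.0 ** (j * (s + 0.5 - 1.0 / pi_)) * level) ** r
    return total ** (1.0 / r)

s, pi_, r = 1.5, 2.0, 2.0
spike = [np.zeros(2**j) for j in range(6)]
spike[4][3] = 0.01                  # one coefficient: the "sparse" configuration
flat = [np.zeros(2**j) for j in range(6)]
flat[5][:] = 0.02                   # a full level of equal coefficients: "dense"
print(besov_seq_norm(spike, s, pi_, r))   # 2^{4(s+1/2-1/pi)} * 0.01, about 0.64
print(besov_seq_norm(flat, s, pi_, r))    # 2^{5(s+1/2)} * 0.02, about 20.48
```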

3. Minimax lower bounds over Besov balls. In this section, we construct lower bounds for the minimax risk (1.8), for both the dense and the sparse case. As mentioned in Section 1.3, for the $L^2$-risk, homoscedastic errors and the dense case, the lower bound was obtained in [29].

To state our result, let us recall (1.11).

Theorem 3.1. Consider the model (1.1) and assume that $f \circ G^{-1} \in B^s_{\pi,r}$. Furthermore, assume that $\inf_x \sigma(x) > 0$. Then, as $n \to \infty$,

$$R_{n,p,g}(B^s_{\pi,r}) \ge \begin{cases} C_p\big(n^{-p\alpha_D/2} \vee n^{-p\alpha/2}\big), & \text{if } s > \frac{p-\pi}{2\pi},\\ C_p\big((n/\log n)^{-p\alpha_S/2} \vee n^{-p\alpha/2}\big), & \text{if } \frac{1}{\pi} - \frac{1}{2} < s < \frac{p-\pi}{2\pi}, \end{cases}$$

where $C_p$ is a finite and positive constant. Furthermore, if $A = \{f : E[f(X_1)] = 0\}$, then

$$\inf_{\hat f}\sup_{f \in B^s_{\pi,r} \cap A} E\|\hat f - f\|^p_{L^p(g)} \ge \begin{cases} C_p n^{-p\alpha_D/2}, & \text{if } s > \frac{p-\pi}{2\pi},\\ C_p (n/\log n)^{-p\alpha_S/2}, & \text{if } \frac{1}{\pi} - \frac{1}{2} < s < \frac{p-\pi}{2\pi}. \end{cases}$$

The above theorem means that if $f \in A$, then the lower bounds are exactly the same as in the case of i.i.d. random errors. If this is not the case, then the rates are influenced by long memory. Furthermore, we can see that the dependence between the predictors and errors has no influence as long as $\sigma(\cdot)$ is bounded from below. Consequently, the theorem extends the findings in [29] in several directions. First, it deals with $p > 2$; second, it identifies the elbow in the sparse case; third, it allows dependence between errors and predictors.

4. Upper bounds for wavelet estimators.

4.1. Partial adaptivity. By partial adaptivity, we mean that our estimator does not depend on $s$, but $G$ is known. Let

$$\Lambda_1 := \{(j,k) : j_0 = -1 \le j \le j_1,\ k = 0, 1, \dots, 2^j - 1\}$$

be the set of resolution levels. Here, the lowest resolution level $j_0 = -1$ corresponds to scaling contributions at resolution level $j = 0$ (i.e., $\psi_{-1,k} := \phi_{0,k}$ and $\beta_{-1,k} := \alpha_{0,k}$). The fine resolution level $j_1$ is set to be

$$2^{j_1} \sim \frac{n}{\log n}, \qquad (4.1)$$

which is a classical condition. In practice, for a sample size $n$, the maximal number of resolution levels is set to be $2^{j_1} \sim n/2$; hence, condition (4.1) typically means that all resolution levels are used in (4.2).

The partially adaptive wavelet estimator we are going to consider is

$$\hat f_n(x) = \sum_{(j,k) \in \Lambda_1}\hat\beta_{j,k}\,I\{|\hat\beta_{j,k}| > \lambda\}\,\psi_{j,k}(G(x)), \qquad (4.2)$$

$$\hat\beta_{j,k} := \frac{1}{n}\sum_{i=1}^{n}\psi_{j,k}(G(X_i))Y_i. \qquad (4.3)$$

The theoretical level-dependent threshold parameter is set to be

$$\lambda = \tau_0\lambda_{n,j} := \tau_0\big(\lambda_n \vee \lambda'_{n,j}\big), \qquad \lambda_n := \frac{\log n}{\sqrt{n}}, \qquad (4.4)$$

where the LRD part $\lambda'_{n,j}$ is the indicator $I\{E[\psi_{j,k}(G(X_1))\sigma(X_1)] \ne 0\}$ multiplied by a term of the order $n^{-\alpha/2}$ of the LRD noise (up to a logarithmic factor), and where $\tau_0$ is large enough. Note that, formally, the threshold depends on both $j$ and $k$; however, from a theoretical point of view, $k$ is irrelevant. Furthermore, in simulation studies, we will average over all $k$ to get a threshold depending on $j$ only.
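The following sketch implements (4.2)-(4.4) for the Haar wavelet with a known $G$. Several choices here are assumptions of ours rather than the paper's: Haar instead of Daubechies wavelets, $j_1 = \lfloor\log_2(n/\log n)\rfloor$ as one concrete reading of (4.1), condition (1.10) taken to hold so that only the part $\tau_0\log n/\sqrt{n}$ of (4.4) is used, and a free `tau0`. The usage example has a uniform design ($G(x) = x$), i.i.d. rather than LRD noise for simplicity, and a step function aligned with the Haar breakpoint at $1/2$, so the only large coefficient is $\beta_{0,0}$ and the estimate should reproduce $f$ closely.

```python
import numpy as np

def haar_psi(j, k, u):
    """psi_{j,k}(u) = 2^{j/2} psi(2^j u - k); Haar psi = 1 on [0,1/2), -1 on [1/2,1)."""
    t = (2.0 ** j) * u - k
    inside = (t >= 0.0) & (t < 1.0)
    return (2.0 ** (j / 2)) * np.where(t < 0.5, 1.0, -1.0) * inside

def warped_haar_estimator(X, Y, G, tau0=1.0):
    """Hard-thresholded warped Haar estimator, a sketch of (4.2)-(4.4).

    Level j = -1 stands for the scaling function phi_{0,0} = 1 on [0, 1).
    Returns a function x -> f_hat(x).
    """
    Y = np.asarray(Y, dtype=float)
    n = Y.size
    U = G(np.asarray(X, dtype=float))           # warped design points in [0, 1)
    j1 = int(np.floor(np.log2(n / np.log(n))))  # one reading of (4.1)
    lam = tau0 * np.log(n) / np.sqrt(n)         # lambda_n part of (4.4) only
    kept = []                                   # (j, k, beta_hat) above threshold
    alpha_hat = np.mean(Y)                      # (4.3) with phi_{0,0} = 1
    if abs(alpha_hat) > lam:
        kept.append((-1, 0, alpha_hat))
    for j in range(j1 + 1):
        m = 2 ** j
        idx = np.minimum((U * m).astype(int), m - 1)   # k such that U is in supp psi_{j,k}
        beta = np.bincount(idx, weights=haar_psi(j, idx, U) * Y, minlength=m) / n
        for k in np.flatnonzero(np.abs(beta) > lam):
            kept.append((j, int(k), beta[k]))

    def f_hat(x):
        u = G(np.asarray(x, dtype=float))
        out = np.zeros_like(u)
        for j, k, b in kept:
            out = out + (b if j == -1 else b * haar_psi(j, k, u))
        return out
    return f_hat

rng = np.random.default_rng(1)
n = 4096
X = rng.random(n)                              # uniform design, so G(x) = x
f = lambda x: np.where(x < 0.5, 1.0, -1.0)     # step aligned with psi_{0,0}
Y = f(X) + 0.1 * rng.standard_normal(n)        # i.i.d. noise, for simplicity only
f_hat = warped_haar_estimator(X, Y, G=lambda x: x)
grid = (np.arange(256) + 0.5) / 256
err = np.max(np.abs(f_hat(grid) - f(grid)))
print(err)
```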

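The following theorem gives the convergence rates of the nonlinear wavelet estimator (4.2) according to the LRD index $\alpha$; recall the elbow locations (1.12).

Theorem 4.1. Let $\hat f_n$ be the wavelet estimator (4.2) with (4.1), (4.3) and (4.4). Assume that $f \circ G^{-1} \in B^s_{\pi,r}([0,1])$, $\pi > 1$, where $s > \max\{1/\pi, 1/2\}$, and that $\sigma(\cdot)$ is bounded. Then,

$$E\|f - \hat f_n\|^p_{L^p(g)} \le C n^{-p\gamma/2}(\log n)^{\kappa},$$

where

$$\gamma = \begin{cases} \alpha_D, & \text{if } \alpha > \alpha_D \text{ and } s > \frac{p-\pi}{2\pi} \text{ (dense phase)},\\ \alpha_S, & \text{if } \alpha > \alpha_S \text{ and } \frac{1}{\pi} < s < \frac{p-\pi}{2\pi} \text{ (sparse phase)},\\ \alpha, & \text{if } \alpha < \min(\alpha_S, \alpha_D) \text{ (LRD phase)}, \end{cases}$$

and $\kappa = p\gamma$ in the sparse and dense phase, $\kappa = 1/2$ in the LRD phase. If $\alpha = 1$, the LRD phase is not relevant.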

Remark 4.2. When $\alpha = 1$, there is only one elbow in the convergence rate, provided that $p > 2$, switching from rate exponent $\alpha_D$ (dense phase) to rate exponent $\alpha_S$ (sparse phase). This is consistent with results obtained in the case of independent errors (see, e.g., [19]).

Remark 4.3. For $\alpha < 1$ and $p > 2$, our rate results seem to be new, and we see that there is an additional elbow in the convergence rate, switching from rate exponent $\alpha_D$ or $\alpha_S$ to $\alpha$ (LRD phase), depending on the relative value of $s$ and $\alpha$. This is illustrated in Figure 1. For $p = 2$, we note that there is only one elbow in the convergence rate, as we are either in the dense phase when $\alpha > \alpha_D$ or in the LRD phase when $\alpha < \alpha_D$. This is consistent with the results of [12] and [29].

Remark 4.4. Note that if condition (1.10) holds for all $j \ge 0$, $k = 0, \dots, 2^j - 1$, then the threshold (4.4) does not involve $\alpha$ (i.e., the estimator is constructed in the same way as if the errors were independent). The threshold (4.4) is then similar to the universal threshold used in wavelet shrinkage (see, e.g., [10]), with an additional multiplicative $(\log n)^{1/2}$ term, which is due to the martingale approximation of LRD sequences.
To gain some insight into condition (1.10), we note that, in the case of a uniform design distribution, this condition reads, for all $j, k$,

$$\int \psi_{j,k}(u)\sigma(u)\,du = 0,$$

which typically holds if $\sigma(u)$ is a polynomial function and $\psi$ has enough vanishing moments. Typically, this condition does not hold if $\sigma$ has some irregularities (jumps, cusps) or if $\sigma$ oscillates at medium and high frequencies.

Remark 4.5. Our definition (4.1) of $j_1$ is the same as the definition used in standard (nonwarped) estimation with independent errors. We note that it is less restrictive than the definition used in the warped wavelet estimation setting of [19]. Because of this choice of $j_1$, the bias is of smaller order than the bias in [19]. Consequently, in the sparse phase we have the restriction $s > 1/\pi$, as compared to $s > 1/2 + 1/\pi$ in Proposition 2 of [19]. See also Remark 4.9.

Remark 4.6. A comparison of our results with rate results obtained under a regular grid design [17, 20, 27] shows that randomization of the design improves rate performances. We illustrate this using the dense region rate exponents, but similar inequalities hold in the sparse region. In the fixed design scenario, the dense region rate exponent is $\alpha s/(s + \alpha/2)$, which, for $\alpha < 1$, is always smaller than the exponent $\min\{2s/(2s+1), \alpha\}$ achievable under a random design.
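The comparison in Remark 4.6 follows from two one-line inequalities: $\alpha s/(s + \alpha/2) = 2\alpha s/(2s + \alpha) < \alpha$, and $2\alpha s/(2s+\alpha) < 2s/(2s+1)$ if and only if $\alpha < 1$. The sketch below simply evaluates both exponents on an illustrative grid of $(s, \alpha)$ values of our choosing.

```python
import numpy as np

def fixed_design_exponent(s, alpha):
    """Dense-region exponent alpha*s/(s + alpha/2) under a regular grid (Remark 4.6)."""
    return alpha * s / (s + alpha / 2)

def random_design_exponent(s, alpha):
    """Dense-region exponent min{2s/(2s+1), alpha} under a random design."""
    return np.minimum(2 * s / (2 * s + 1), alpha)

s = np.linspace(0.1, 5.0, 50)[:, None]
alpha = np.linspace(0.05, 0.95, 19)[None, :]
gap = random_design_exponent(s, alpha) - fixed_design_exponent(s, alpha)
print(gap.min())   # strictly positive on this grid
```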

Remark 4.7. Using the weighted norm approximation of Theorem 4.1, we can conclude some results for the usual norm, even when $g(x_0) = 0$ for some $x_0 \in [0,1]$. To see this, let $A = \{x \in [0,1] : g(x) \ne 0\}$ and assume that the Lebesgue measure of $[0,1] \setminus A$ is zero. If, now, $\|\cdot\|_p = \|\cdot\|_{L^p(I)}$ is the usual $L^p$-norm, then, with $1/q_1 + 1/q_2 = 1$, $q_1, q_2 > 1$, $l \in \mathbb{R}$,

$$E\|f - \hat f_n\|_p^p = \int_A E|f - \hat f_n|^p = \int_A E|f - \hat f_n|^p\,\frac{g^l}{g^l} \le \Big(\int_A E|f - \hat f_n|^{pq_1}g\Big)^{1/q_1}\Big(\int_A g^{-lq_2}\Big)^{1/q_2}$$

by choosing $lq_1 = 1$. Take, now, as in [19], $g(x) = (a+1)x^a$, $x \in [0,1]$. Then, the latter integral is finite as long as $a < (q_2 - 1)^{-1}$. On the other hand, we can apply Theorem 4.1 to conclude that, in the dense and LRD phase, the rates of convergence of $E\|f - \hat f_n\|_p^p$ are the same as those of $E\|f - \hat f_n\|^p_{L^p(g)}$, as long as

$$s > \frac{p\big(1 + 1/(q_2 - 1)\big) - \pi}{2\pi}. \qquad (4.5)$$

Note, however, that, if $a < (q_2 - 1)^{-1} < (2 + \pi - p)/p$, then

$$\frac{1}{\pi} > \frac{p\big(1 + 1/(q_2 - 1)\big) - \pi}{2\pi},$$

so that (4.5) is automatically satisfied, since $s > 1/\pi$. Consequently, for any $a < (2 + \pi - p)/p$, we can obtain the optimal rates. Of course, this approach does not work in the sparse case, because the resulting upper bound is not optimal (cf. Theorem 2 of [19]).

Furthermore, if $0 < m \le g \le M < \infty$, then the norms $\|\cdot\|_p$ and $\|\cdot\|_{L^p(g)}$ are equivalent.

4.2. Full adaptivity. By full adaptivity, we mean that our estimator does not depend on $s$ and $G$ is unknown. In this case, the fine resolution level $j_1$ in (4.1) has to be modified as follows:

$$2^{j_1} \sim \sqrt{\frac{n}{\log n}}. \qquad (4.6)$$

In fact, in general (see Remark 7.6), we cannot use the same fine resolution level as in (4.1).

Assume that we have $2n$ observations from the model (1.1), coded as follows: the first $n$ observations are denoted by $X'_1, \dots, X'_n$, the remaining ones by $X_1, \dots, X_n$. The estimator that achieves full adaptivity is

$$\tilde f_n(x) = \sum_{(j,k) \in \Lambda_1}\tilde\beta_{j,k}\,I\{|\tilde\beta_{j,k}| > \lambda\}\,\psi_{j,k}(G_n(x)), \qquad (4.7)$$

where, now, $G_n$ is the empirical distribution function associated with $X'_1, \dots, X'_n$ and

$$\tilde\beta_{j,k} := \frac{1}{n}\sum_{i=1}^{n}\psi_{j,k}(G_n(X_i))Y_i. \qquad (4.8)$$
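The sample-splitting device behind (4.7)-(4.8) can be sketched as follows: $G_n$ is built from the first half-sample only and then evaluated at the second half-sample, so that $G_n$ is independent of the $X_i$ entering (4.8). The uniform design used here (for which $G$ is the identity) is an illustrative assumption; by the Dvoretzky-Kiefer-Wolfowitz inequality, $\sup_x|G_n(x) - G(x)|$ is of order $n^{-1/2}$.

```python
import numpy as np

def empirical_cdf(sample):
    """Return G_n(x) = #{i : X'_i <= x} / n, built from the first half-sample."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = xs.size
    return lambda x: np.searchsorted(xs, x, side="right") / n

rng = np.random.default_rng(2)
X_all = rng.random(20_000)          # 2n design points, here U[0, 1] so G(x) = x
X_prime, X = X_all[:10_000], X_all[10_000:]
G_n = empirical_cdf(X_prime)        # uses X'_1, ..., X'_n only
U = G_n(X)                          # warped points G_n(X_i) entering (4.8)
print(np.max(np.abs(U - X)))        # small, since G is the identity here
```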

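Theorem 4.8. Consider the estimator (4.7) with (4.4), (4.6) and (4.8). Assume that $f \circ G^{-1} \in B^s_{\pi,r}([0,1]) \cap \mathrm{Lip}(1/2)$, $\pi > 1$, where $s > \max\{1/\pi, 1/2\}$, and that $\sigma(\cdot)$ is bounded. Then, the rates of Theorem 4.1 remain valid with

$$\gamma = \begin{cases} \alpha_D, & \text{if } \alpha > \alpha_D \text{ and } s > \frac{p-\pi}{2\pi} \text{ (dense phase)},\\ \alpha_S, & \text{if } \alpha > \alpha_S \text{ and } \frac{1}{\pi} + \frac{1}{2} < s < \frac{p-\pi}{2\pi} \text{ (sparse phase)},\\ \alpha, & \text{if } \alpha < \min(\alpha_S, \alpha_D) \text{ (LRD phase)}. \end{cases}$$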
Remark 4.9. Note that, in the sparse phase, there is the additional restriction $s > 1/\pi + 1/2$, as compared to Theorem 4.1. This is due to the larger bias, which, in turn, is due to choosing a lower highest resolution level.

4.3. Shape estimation. As first noticed in [12], the effect of LRD is concentrated on the zero Fourier frequency component of the target function $f$ and corresponds to the scale $\int f$ of $f$. Keeping this in mind, it is possible to avoid (or reduce) the curse of LRD by considering the estimation of the shape of the function, $f - \int f$. Taking into account the design distribution in (2.4), we set

$$f^*(x) := f(x) - \int_0^1 f(G^{-1}(y))\,dy =: f(x) - c_{f,g}.$$

Note that, since wavelets integrate to zero, the wavelet coefficients of $f^* \circ G^{-1}$ are equal to $\beta_{j,k}$. We set

$$\hat f^*_n(x) := \sum_{(j,k) \in \Lambda_1,\ j \ne -1}\hat\beta_{j,k}\,I\{|\hat\beta_{j,k}| > \lambda\}\,\psi_{j,k}(G(x)) \qquad (4.9)$$

and denote by $\tilde f^*_n$ the corresponding fully adaptive estimator. The trick here is simply to remove the scaling coefficient. This is allowed, since $\int_0^1 f^*(G^{-1}(y))\,dy = 0$. In this way, there will be no LRD effect on the convergence rates.

Theorem 4.10. Let $\hat f^*_n$ be the wavelet estimator (4.9). Under the assumptions of Theorem 4.1,

$$E\|f^* - \hat f^*_n\|^p_{L^p(g)} \le C n^{-p\gamma^*/2}(\log n)^{p\gamma^*}, \qquad (4.10)$$

where $\gamma^* = \alpha_D$ in the dense phase and $\gamma^* = \alpha_S$ in the sparse phase. Under the assumptions of Theorem 4.8, the same bound is valid for $\tilde f^*_n$.

Note that $\int_0^1 f^*(G^{-1}(y))\,dy = E[f^*(X_1)] = 0$. Therefore, by comparing (4.10) with the second part of Theorem 3.1, we see that $f^*$ is estimated (up to a log term) with the optimal rates.
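The effect of dropping the scaling coefficient in (4.9) can be seen on a discrete analogue. In the sketch below (a regular grid, i.e. a uniform design with known $G$, and a Haar pyramid, both our assumptions), zeroing the single coarsest scaling coefficient and inverting the transform returns exactly $f$ minus its mean, that is, the shape; no thresholding is involved here.

```python
import numpy as np

def haar_dwt(x):
    """Orthonormal Haar pyramid: (scaling coefficient, details from coarse to fine)."""
    c, details = np.asarray(x, dtype=float), []
    while c.size > 1:
        details.append((c[0::2] - c[1::2]) / np.sqrt(2.0))
        c = (c[0::2] + c[1::2]) / np.sqrt(2.0)
    return c[0], details[::-1]

def haar_idwt(c0, details):
    """Inverse Haar pyramid."""
    c = np.array([c0], dtype=float)
    for d in details:
        out = np.empty(2 * c.size)
        out[0::2], out[1::2] = (c + d) / np.sqrt(2.0), (c - d) / np.sqrt(2.0)
        c = out
    return c

n = 1024
u = (np.arange(n) + 0.5) / n                     # regular grid on [0, 1]
f = 3.0 + np.sin(2 * np.pi * u) + (u > 0.3)      # hypothetical target with nonzero mean
c0, det = haar_dwt(f)
shape = haar_idwt(0.0, det)                      # scaling coefficient removed, as in (4.9)
print(np.max(np.abs(shape - (f - f.mean()))))    # at rounding-error level
```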

5. Finite sample properties. 

5.1. Implementation. In our simulation studies, we focus on the LRD effect. For this purpose, we assume that $U_{(1)} < U_{(2)} < \dots < U_{(n)}$ denotes the ordered design sample from the uniform distribution, and $Y_{(1)}, \dots, Y_{(n)}$ the corresponding observations of $Y_i$ (not necessarily ordered). If $G_n$ is the empirical distribution function associated with $U_{(1)}, \dots, U_{(n)}$, then $G_n(U_{(i)}) = i/n$ and we have

$$\frac{1}{n}\sum_{i=1}^{n}\psi_{j,k}(G_n(U_{(i)}))Y_{(i)} = \frac{1}{n}\sum_{i=1}^{n}\psi_{j,k}(i/n)Y_{(i)}. \qquad (5.1)$$
As noted in [3], in the case of a uniform design distribution, the ordered sample $U_{(1)}, \dots, U_{(n)}$ may be used as a proxy for the regular grid $t_i = i/(n+1)$. Thus, in this case, (5.1) is computed by a simple application of Mallat's algorithm using the $Y_{(i)}$'s as input variables. This algorithm is implemented in the wavethresh R-package with various thresholding options, from which it is straightforward to compute function and shape estimators. This is the software (appropriately modified) we have used in the examples below.
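The reduction in (5.1) rests on sorting the responses by the design points and on the identity $G_n(U_{(i)}) = i/n$. The small sketch below, with hypothetical data of our own, makes both steps explicit; the sorted vector `Y_sorted` is what a regular-grid DWT would receive as input.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
U = rng.random(n)                                            # uniform design sample
Y = np.sin(2 * np.pi * U) + 0.1 * rng.standard_normal(n)    # hypothetical responses
order = np.argsort(U)
U_sorted, Y_sorted = U[order], Y[order]   # U_(1) < ... < U_(n) and matching Y_(i)

def G_n(x):
    """Empirical distribution function of the design sample."""
    return np.searchsorted(U_sorted, x, side="right") / n

print(G_n(U_sorted))   # i/n for i = 1, ..., n, as used in (5.1)
```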

Data-based threshold. As mentioned in Remark 4.4, if (1.10) holds, then the threshold $\tau_0\log n/\sqrt{n}$ is almost the same as in the usual fixed-design regression with i.i.d. errors, up to the additional log penalty. The parameter $\tau_0$ is estimated by the standard deviation of the wavelet coefficients on the finest resolution level (option by.level=FALSE) or by computing the standard deviation on each level separately (option by.level=TRUE).

The LRD part of the threshold may be chosen in the following way. First, note that $E[\psi_{j,k}(G(X_1))\sigma(X_1)]$ is just the wavelet coefficient of $\sigma(G^{-1}(\cdot))$. Therefore, we may perform a preliminary estimation and compute residuals, which serve as proxies for $\sigma(X_i)\varepsilon_i$. From these, we can estimate $\sigma(\cdot)$ and then the dependence index $\alpha$. If $\hat\sigma(\cdot)$ is the estimator of $\sigma(\cdot)$, then we may apply the DWT to $\hat\sigma(G^{-1}(i/n))$. Extracting the resulting wavelet coefficients on level $j$, we obtain the estimates of $E[\psi_{j,k}(G(X_1))\sigma(X_1)]$. For a given $j$, the level-dependent threshold is obtained as the average over $k = 0, \dots, 2^j - 1$.
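The level-wise diagnostic just described can be sketched as follows. Assumptions of ours: a uniform design (so $G^{-1}(u) = u$), the true $\sigma$ plugged in instead of an estimate $\hat\sigma$, Haar instead of Daubechies filters, and averaging of absolute coefficients over $k$. For the constant noise level of scenario (a) below all level averages vanish, so condition (1.10) holds; for the irregular noise level of scenario (c) they do not, and the LRD part of the threshold becomes active.

```python
import numpy as np

def haar_details(x):
    """Orthonormal Haar detail coefficients, one array per level j (coarse to fine)."""
    c, details = np.asarray(x, dtype=float), []
    while c.size > 1:
        details.append((c[0::2] - c[1::2]) / np.sqrt(2.0))
        c = (c[0::2] + c[1::2]) / np.sqrt(2.0)
    return details[::-1]

def level_averages(sigma_vals):
    """Average |coefficient| per level, divided by sqrt(n) so that it approximates
    the continuous coefficients E[psi_{j,k}(G(X_1)) sigma(X_1)]."""
    n = len(sigma_vals)
    return np.array([np.mean(np.abs(d)) for d in haar_details(sigma_vals)]) / np.sqrt(n)

n = 1024
u = np.arange(1, n + 1) / n                       # grid i/n; uniform design, G^{-1}(u) = u
sigma_a = np.full(n, 0.1)                         # scenario (a): constant noise level
sigma_c = 0.1 * (np.sin(np.pi * u) - np.sign(u - 0.4))   # scenario (c): irregular
print(level_averages(sigma_a).max())              # 0: condition (1.10) holds
print(level_averages(sigma_c)[:4])                # nonzero: LRD part of the threshold active
```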

5.2. Examples. We generate the $Y_i$'s according to (1.1) with the Lidar, Bumps and Doppler targets, the Doppler target being

$$f(x) = \big(x(1-x)\big)^{1/2}\sin\Big(\frac{2.1\pi}{x + 0.05}\Big), \qquad (5.2)$$

a uniform design distribution, $X_i = U_i \sim \mathcal{U}[0,1]$, and the following three $\sigma(\cdot)$ scenarios: (a) homoscedastic, with $\sigma(x) = 0.1$ (constant noise level); (b) heteroscedastic, with $\sigma(x) = 0.1\sqrt{x + 0.5}$ (linear noise level); and (c) heteroscedastic, with $\sigma(x) = 0.1(\sin(\pi x) - \mathrm{sign}(x - 0.4))$ (irregular noise level). For calibration and comparison purposes, we quote, for scenario (a) with the Doppler target, the signal-to-noise ratio: SNR ≈ 9.34 dB. All other target functions (Bumps and Lidar) were standardized to obtain the same SNR. Two different threshold parameters are considered: the one given by (4.4) and the standard Donoho-Johnstone threshold. The noise level is estimated either on each level (option by.level=TRUE) or globally (option by.level=FALSE). For these threshold values, we apply two threshold policies, Hard and Soft. Finally, Daubechies DB(6) and DB(2) wavelets are considered. For each of those scenarios we study the effect of the LRD parameter $\alpha$ on the performances of the function estimator (4.2) and the shape estimator (4.9) for sample size $n = 1024$.

Table 1
Monte Carlo approximations to MSE of the function estimator of the Doppler target, with 1000 replications of the model (1.1), in scenarios (a), (b) and (c) for some values of the dependence parameter d

                  (a)                  (b)                  (c)
  d        DJ thr   LRD thr     DJ thr   LRD thr     DJ thr   LRD thr
  0.000    0.0277   0.0277      0.0276   0.0305      0.0280   0.0329
  0.150    0.0276   0.0276      0.02745  0.0288      0.0279   0.0319
  0.300    0.0284   0.0284      0.0282   0.0287      0.0289   0.0315
  0.325    0.0280   0.0280      0.0278   0.0281      0.0284   0.0316
  0.350    0.0282   0.0282      0.0281   0.0282      0.0288   0.0319
  0.375    0.0299   0.0299      0.0297   0.0299      0.0306   0.0335
  0.400    0.0320   0.0320      0.0317   0.0319      0.0326   0.0350
  0.425    0.0350   0.0350      0.0347   0.0347      0.0358   0.0383
  0.450    0.0449   0.0449      0.0445   0.0446      0.0466   0.0486
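A Python stand-in for the simulation design may help readers reproduce the setting without R. It is a sketch under our own assumptions: FARIMA(0, d, 0) noise is generated by a truncated MA($\infty$) filter with weights $\psi_0 = 1$, $\psi_k = \psi_{k-1}(k-1+d)/k$ (this is not the fracdiff implementation), the noise is crudely standardized, and only the scenarios (a) and (c) are coded. The weights decay like $k^{d-1}$, which matches the rate in (1.2) when $d = (1-\alpha)/2$.

```python
import numpy as np

def farima_0d0(n, d, rng, M=5000):
    """Approximate FARIMA(0, d, 0) noise via a truncated MA(infinity) filter.

    MA weights psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k decay like k^{d-1};
    truncating after M lags is an approximation made by this sketch.
    """
    k = np.arange(1, M)
    psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))
    eta = rng.standard_normal(n + M - 1)
    return np.convolve(eta, psi, mode="valid")       # exactly n values

def doppler(x):
    """Doppler target (5.2)."""
    return np.sqrt(x * (1 - x)) * np.sin(2.1 * np.pi / (x + 0.05))

sigmas = {
    "a": lambda x: 0.1 * np.ones_like(x),                           # constant
    "c": lambda x: 0.1 * (np.sin(np.pi * x) - np.sign(x - 0.4)),    # irregular
}

rng = np.random.default_rng(4)
n, d = 1024, 0.35
alpha = 1 - 2 * d                        # d = (1 - alpha)/2
X = rng.random(n)                        # uniform design
eps = farima_0d0(n, d, rng)
eps /= eps.std()                         # crude standardization (our choice)
Y = doppler(X) + sigmas["c"](X) * eps    # model (1.1), scenario (c)
```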

Monte Carlo results for Doppler and Bumps, with N = 1000 replications and the Daubechies DB(6) wavelet, are summarized in Tables 1 and 2. The notation DJ thr and LRD thr stands for the Donoho-Johnstone universal threshold and the one given in (4.4), respectively.

The mean square error $\mathrm{MSE} := \frac{1}{n}\sum_{i=1}^{n}\big(f(i/n) - \hat f_n(i/n)\big)^2$ is plotted in Figure 2 as a function of the dependence parameter $d = (1 - \alpha)/2 \in (0, 1/2)$. Here, $d$ corresponds to the fractional integration parameter required to simulate LRD noise using the fracdiff R-package.

Fig. 2. Monte Carlo approximation to MSE, n = 1024. Doppler function with σ(x) = 0.1. (MSE plotted against the dependence parameter d; figure not reproduced.)

Analysis of the results.

1. Figure 2 describes the MSE for the homoscedastic scenario (a). We can observe that the MSE seems to remain stable when the dependence is in the [0, 0.35] range. Then, a sudden increase occurs after 0.35, suggesting that, for this simulated example, the LRD phase becomes active for strongly dependent error terms, and confirming the detrimental effect of LRD in this region. This is also confirmed in Table 1. A similar effect is visible in the case of the Bumps function, in Table 2.

2. We compare the classical Donoho-Johnstone threshold with the one introduced in (4.4). Comparing the DJ thr and LRD thr columns in Tables 1 and 2, we can see that there is little difference in the case of the heteroscedastic noise of scenario (b), especially for stronger dependence. However, in the case of irregular noise [like scenario (c) above], the picture is not clear. In the Doppler case the classical threshold performs better; on the other hand, the level-dependent threshold (4.4) is preferable in the case of the Bumps target. This also applies to the Lidar function.
3. There is not much difference between DB(2) and DB(6), nor between the Hard and Soft policies. However, the level-by-level noise estimation (option by.level=TRUE, i.e., estimation of $\tau_0 = \tau_{0j}$ on each level) gives worse results in terms of MSE. The reason for this could be the following: the variance estimator in the case of LRD has slower rates of convergence than for the associated i.i.d. sequence. Consequently, at low frequencies (this is where LRD comes into play), the noise level estimates may not be very precise. The practical message is that, in the LRD case, we should use noise level estimates based on the highest resolution level.

Table 2
Monte Carlo approximations to MSE of the function estimator (4.2) of the Bumps target, with 1000 replications of the model (1.1), in scenarios (a), (b) and (c) for some values of the dependence parameter d

                  (a)                  (b)                  (c)
  d        DJ thr   LRD thr     DJ thr   LRD thr     DJ thr   LRD thr
  0.000    0.1295   0.1295      0.1293   0.1273      0.1297   0.1239
  0.150    0.1298   0.1298      0.1297   0.1288      0.1301   0.1256
  0.300    0.1297   0.1297      0.1294   0.1295      0.1300   0.1263
  0.325    0.1301   0.1301      0.1297   0.1296      0.1308   0.1281
  0.350    0.1309   0.1309      0.1306   0.1306      0.1315   0.1289
  0.375    0.1328   0.1328      0.1324   0.1324      0.1334   0.1308
  0.400    0.1340   0.1340      0.1335   0.1335      0.1349   0.1327
  0.425    0.1377   0.1377      0.1372   0.1372      0.1389   0.1367
  0.450    0.1462   0.1462      0.1456   0.1456      0.1487   0.1460

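6. Proofs: lower bounds. To obtain the lower bounds, we closely follow the ideas of [25]. Let us first introduce some notation. Denote $\mathbf{Y} = (Y_1, \dots, Y_n)'$, $\boldsymbol\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)'$, $\mathbf{1} = (1, \dots, 1)'$ and, for any function $f$, let

$$f(\mathbf{X}) := (f(X_1), \dots, f(X_n))',$$

and let $f(\mathbf{X})/\sigma(\mathbf{X})$ and $f(\mathbf{X}) * \sigma(\mathbf{X})$ be the coordinatewise division and multiplication, respectively, of two vectors. Furthermore, $H$ is the covariance matrix of $\boldsymbol\varepsilon$. With a slight abuse of notation, let $H = (h_{il})_{i,l=1,\dots,n}$ and $H^{-1} = (h^{il})_{i,l=1,\dots,n}$ [of course, $(h_{il})^{-1} \ne h^{il}$ in general].

For two functions $f$, $f_0$, denote by $\Lambda_n(f_0, f)$ the likelihood ratio

$$\Lambda_n(f_0, f) = dP_Y(f_0)/dP_Y(f),$$

where $P_Y(f)$ is the distribution of the process $\{Y_i, i \ge 1\}$ when $f$ is true.

Note that the model (1.1) can be written as $\mathbf{Y}' = f(\mathbf{X})' + (\sigma(\mathbf{X}) * \boldsymbol\varepsilon)'$. Then, we have, under $P_Y(f)$,

$$\begin{aligned} 2\ln\Lambda_n(f_0, f) &= \Big(\frac{\mathbf{Y} - f(\mathbf{X})}{\sigma(\mathbf{X})}\Big)'H^{-1}\Big(\frac{\mathbf{Y} - f(\mathbf{X})}{\sigma(\mathbf{X})}\Big) - \Big(\frac{\mathbf{Y} - f_0(\mathbf{X})}{\sigma(\mathbf{X})}\Big)'H^{-1}\Big(\frac{\mathbf{Y} - f_0(\mathbf{X})}{\sigma(\mathbf{X})}\Big)\\ &= 2\Big(\frac{f_0(\mathbf{X}) - f(\mathbf{X})}{\sigma(\mathbf{X})}\Big)'H^{-1}\boldsymbol\varepsilon - \Big(\frac{f_0(\mathbf{X}) - f(\mathbf{X})}{\sigma(\mathbf{X})}\Big)'H^{-1}\Big(\frac{f_0(\mathbf{X}) - f(\mathbf{X})}{\sigma(\mathbf{X})}\Big). \end{aligned} \qquad (6.1)$$

In what follows, $\pi_0$ and $C_1, \dots, C_4$, $C_p$ will be fixed positive numbers.

Sparse case. This is the case when the hardest function to estimate is represented by only one term in the wavelet expansion. In this case, we use the result of Korostelev and Tsybakov (see [16], Lemma 10.1).

Lemma 6.1. Let $\mathcal{V}$ be a functional space, and let $d(\cdot,\cdot)$ be a distance on $\mathcal{V}$. Let $\mathcal{V}$ contain the functions $f_0, f_1, \dots, f_K$ such that:

(a) $d(f_k, f_{k'}) \ge \delta > 0$ for $k, k' = 0, 1, \dots, K$, $k \ne k'$,

(b) $K \ge \exp(\lambda_n)$ for some $\lambda_n > 0$,

(c) $\ln\Lambda_n(f_0, f_k) = u_{nk} - v_{nk}$, where $v_{nk}$ are constants and $u_{nk}$ is a random variable such that, for some $\pi_0 > 0$, we have $P_{f_k}(u_{nk} > 0) \ge \pi_0$,

(d) $\sup_k v_{nk} \le \lambda_n$.

Then, for an arbitrary estimator $\hat f_n$,

$$\sup_{f \in \mathcal{V}} P_f\big(d(\hat f_n, f) \ge \delta/2\big) \ge \pi_0/2.$$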

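To use this lemma, let us now choose $\mathcal{V} = \{f_{jk} : 0 \le k \le 2^j - 1\}$, where $f_{jk}(x) = \beta_{j,k}\psi_{j,k}(G(x))$ [i.e., $f_{jk}(G^{-1}(u)) = \beta_{j,k}\psi_{j,k}(u)$], and $f_0 = 0$. Since $f_{jk} \circ G^{-1} \in B^s_{\pi,r}$, we have $\beta_{j,k} \le A2^{-js'}$, where $s' = s + 1/2 - 1/\pi$. Furthermore, for any $f, h \in \mathcal{V}$, let

$$d(f, h) = \|f - h\|_{L^p(g)}$$

be the weighted $L^p$-norm on $\mathcal{V}$. Then,

$$d(f_{jk}, f_{jk'}) \ge \beta_{j,k}2^{j(1/2 - 1/p)}\|\psi\|_p =: \delta.$$

Plugging in $f_0 = 0$ and $f = f_{jk}$ in (6.1), we obtain

$$2\ln\Lambda_n(f_0, f_{jk}) = -\Big(\frac{f_{jk}(\mathbf{X})}{\sigma(\mathbf{X})}\Big)'H^{-1}\Big(\frac{f_{jk}(\mathbf{X})}{\sigma(\mathbf{X})}\Big) - 2\Big(\frac{f_{jk}(\mathbf{X})}{\sigma(\mathbf{X})}\Big)'H^{-1}\boldsymbol\varepsilon. \qquad (6.2)$$

Write

$$\ln\Lambda_n(f_0, f_{jk}) = \{\ln\Lambda_n(f_0, f_{jk}) + \lambda_n\} - \lambda_n =: u_{nk} - v_{nk}.$$

Note, also, that the quadratic form in the first component of (6.2) is nonnegative, since $H$ (and so $H^{-1}$) is positive definite.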

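By the Cauchy-Schwarz inequality and Lemma 6.2 below, we obtain

$$E\Big|\Big(\frac{f_{jk}(\mathbf{X})}{\sigma(\mathbf{X})}\Big)'H^{-1}\boldsymbol\varepsilon\Big| \le \Big(E\Big[\Big(\frac{f_{jk}(\mathbf{X})}{\sigma(\mathbf{X})}\Big)'H^{-1}\Big(\frac{f_{jk}(\mathbf{X})}{\sigma(\mathbf{X})}\Big)\Big]\Big)^{1/2}. \qquad (6.3)$$

Therefore, writing $A_n := E\big[(f_{jk}(\mathbf{X})/\sigma(\mathbf{X}))'H^{-1}(f_{jk}(\mathbf{X})/\sigma(\mathbf{X}))\big]$, by (6.2), (6.3), Chebyshev's inequality and the aforementioned positivity of the quadratic form in (6.2), we obtain

$$\begin{aligned} P(u_{nk} > 0) &= P\big(\ln\Lambda_n(f_0, f_{jk}) > -\lambda_n\big)\\ &\ge 1 - \lambda_n^{-1}E\Big[\frac{1}{2}\Big(\frac{f_{jk}(\mathbf{X})}{\sigma(\mathbf{X})}\Big)'H^{-1}\Big(\frac{f_{jk}(\mathbf{X})}{\sigma(\mathbf{X})}\Big) + \Big|\Big(\frac{f_{jk}(\mathbf{X})}{\sigma(\mathbf{X})}\Big)'H^{-1}\boldsymbol\varepsilon\Big|\Big] \ge 1 - \frac{A_n + 2\sqrt{A_n}}{2\lambda_n}. \end{aligned} \qquad (6.4)$$

Now, $\mathbf{1}'H\mathbf{1} = \sum_{i,l}h_{il} = \mathrm{Var}\big(\sum_{i=1}^{n}\varepsilon_i\big) \sim c_\alpha n^{2-\alpha}$ via (1.3). Also, $(\mathbf{1}'H^{-1}\mathbf{1})(\mathbf{1}'H\mathbf{1}) \asymp n^2$ (the lower bound $\ge n^2$ being the Cauchy-Schwarz inequality), so that

$$\mathbf{1}'H^{-1}\mathbf{1} \asymp n^{\alpha}. \qquad (6.5)$$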

Furthermore, 



^y/(2.') 

logn / 



a(X) J V a(X) 

n 

= E[ff,{X)/a\X)] Y: + {nfik{X)/a{X)]f ^ 

i=l i^l 

(6.6) 

<l|l/^llLll^i/3|,.trace(H-^) + 2-^-/3|,(l'H"il)||l/a||L 
= 0{n)l3lk + 0(2-^n")/32fc = ©(n)/?!,. 

Summarizing, we obtain that the nominator in (6.4) is bounded by Cinpij^. 
We now choose j, according to 

(6.7) 2^ = C2 
Then, 

1 logn 

^' ' - 2(. + l/2-l/vr) " - + ^ > 4(. + l/2-l/vr) = ^ 

Therefore, 

P(u„fe > 0) > 1 - 4CiC2"^''(s + 1/2 - l/vr) > TTo > 
by the appropriate choice of C2 in (6.7). Consequently, 
inf sup E||/ - . 

(6.8) >inf sup P{\\f - U\\LP[g)>5/2) 



-p/2as 



Dense case. Let be the vector with components ry^ = ±1, /c = 0, . . . , 2-' — 1. 
Let 7/* be the vector with components rj], = {—lY^^^^^rfk- Let fjrjix) = 7j x 
EfcLV r]kipj,k{G{x)). To have /j^oG"! G we must have 7^- < A2-J(^+V2). 

Note that /j^ - = ±ljipji. Now, plug-in / = fjn and /o = /j^i in (6.1) 
to get 

_.,.„„„,;,.,|(^).-V,,x,...,(4^).-. 
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As in (6.6), we have 

E^^J|lnA„(/o,/)|]<C3n7|<^o, 

if we choose 

with the appropriate C4. Now, as in [25], 

infmaxE/^J|/„ - fjr,\\ii(g) > C2^^'^lj, 

fn 

which, by Cauchy-Schwarz inequahty, yields 

(6.9) inf sup E||/-A||^ >Cpn-f/2°°. 

fn /Gi3=,,n{0} 

Therefore, via (6.9) and (6.8), we obtain the i.i.d. lower bounds in Theorem 
3.1. It finishes the proof in case of f € A. If / ^ A, then its mean has 

to be estimated. The lower bound follows in the very same way as on page 
645 of [29]. This finishes the proof of Theorem 3.1. 

Lemma 6.2. We have 

E[|e'H-V(X)p]<E[/(X)'H-V(X)]. 

Proof. Bearing in mind the symmetry of H, 
E[|e'H-V(X)p] 



:E 



-E[f{X)]Y^E[e,ea-,%l + {E[f{X)]}' ^ E\e,e.^^ri'^-j^ 

-E[f{x)]j2^7lj2^,,,^7^' + {E[f{x)]}' Y: ^^,T.^m^u' 

-E[f\X)]Y^ir^]{^-B-%,i + {E[f{X)]f 
:E[/2(X)]trace(H-i) + {E[/(X)]}2 ^ ^-^^ 

:E[/(X)'H-V(X)]. □ 
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7. Proofs: upper bounds. 

7.1. Decomposition of empirical wavelet coefficients. Here, we establish 
decomposition of the form, 

Pj,k ~ Pj,k = i-i-d. part + martingale part + wavelet LRD part. 

From (4.3), 

(7.1) E[4fc] =E[V^,,fc(Xi)/(Xi)] = / ^l^,,k{y)f{G-\y))dy = PJ,k■ 



We set Ui := G{Xi),i = 1, . . . ,n, the C/j's are uniformly distributed on [0,1], 
by independence 

E[V^,-fc([/i)a(Xi)ei] = E[V^,-fc(C/i)a(Xi)]E[ei] = 0, 

1 " 

Ti . 
1=1 

1 " 



- Y.i^j^kmf{Xi) - E[v^j, fe(c/i)/(Xi)]) 



(7.2) 



1=1 



1 " 



+ -y^^j,k{Ui)(j{X^)ei 



1=1 



=:Aq + Ai. 

Note that Aq is the sum of i.i.d. random variables, whereas the dependence 
structure is included in Ai only. The part Ai is decomposed further. Let 
= cri'ni,Xi,r]i_i,Xi_i, . . .). Let Si^i-i = £i - rji. Note that ei,j_i is Ti-i- 
measurable and {r]i,Xi) is independent of Thus, 

E[V;,- fc(C/,)a(Xi)e^|.F^_l] = ei,i„iE[Vj,fc(C/i)a(Xi)]. 

We write 

1 " 

Ai = -y2^j,k{Ui)a{Xi)ei 
n ^ 

Ti . 

(7.3) 

1 

+ -E[V;,-fc(C/i)c7(Xi)]^e,,i_i 
" i=i 

=:A2 + A^ 
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and 

Pj,k - Pj,k = A0 + A2 + A3 

(7.4) 

=: i.i.d. part + martingale part + wavelet LRD part. 

Consider, also, the following corresponding decomposition for the scaling 
coefficients aj^k- 

^j,k - 0.j,k = Bo + B2 + B3 

(7.5) 

=: i.i.d. part + martingale part + scaling LRD part. 

An important feature of this decomposition is that the LRD term involves 
the partial sums of ej,i_i only. Furthermore, if (1.10) holds, then ^3 = and 
the LRD part does not contribute. On the other hand, the scaling LRD part 
is always present. 

As for the shape estimation, let ^. be the scaling coefficient of /* o . 
Clearly, 

(^*j,k = (^j,k- j f{G~^{y))dy ( (l)j±{y)dy=:aj^k-Cf,GH(l)j,k{Ui)]- 
Jo Jo 

Let Cf^G be an estimator of Cf^c [e-g-, cj^g = ^ Sr=i /(^«)]- Then, we de- 
compose 

^*j,k - ^*j,k = Bo + B2 + B3 

= i.i.d. part + martingale part + Cf^G^4'j,kiUi) — c/^GE0j,fc(f^i) 

+ -E[0,-fc(C/i)(T(Xi)]5]e,,i_i - -E[<l),,kiUi)]Y.aiXi)ei. 

1=1 1=1 

If (7(-) = 1, then the last two terms equal 

-1 "1 "1 ^ 

-E[,/.,- ,.(C/i)] - -E[,/.,- ,.(C/i)] ^e, = — E[0,- fc(i7i)] 

Tl . Tl . ^ Tl . 

1=1 1=1 1=1 

which is the just sum of i.i.d. random variables. Consequently, if (1.10) 
holds, then the LRD effect is not present in the scaling coefficient estimation. 
Otherwise, the LRD part is present and affects convergence rates. Therefore, 
by removing the scaling coefficient (pofi^ we guarantee that LRD does not 
affect the shape estimation. 

7.2. Decomposition of the modified wavelet coefficients. In this section, 
we decompose Pj^k- Let us redefine 

= a{7]i, Xi,r]i_i, Xi^i, . . .) y a{X[, . . . , X'J. 
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Note that 

and tjjj^k{Gn{Xi))(T{Xi)ei is ^j-measurable. [This shows the importance of 
defining based on the first different of the sample, X[, . . . ,X'^.] Simi- 

larly to (7.2) and (7.3), we decompose 

1 " 

Pj,k - Pj,k = - Y.{ijj.k{Gn{X^))Yi - fc) 

i=l 
1 " 

= -Y.(^jAGn{Xi))f{X,)-P,,k) 

i=l 
1 " 

+ -Y.(^jMGn{Xi))a{Xi)ei 

Ti . ^ 

-E[^PjAGn{X,))a{X,)ei\T^.i]) 

1 " 
+ -E[V'j,fc(G„(Xi))a(Xi)]5]e,,i_i 

" i=i 
=:Ao + A2 + A3. 

7.3. Moment bounds. 

Lemma 7.1. For all j > and k = 0, ... ,2^ — I and p>2, 

(7.7) E[|/3,, - P.^kH = 0{n-P/^) + 0{2-^p/\-p^/^) 

as long as 2^ < n. The bound also applies to scaling coefficients \oij^k~ Oij,k\^ ■ 
Moreover, if (1.10) holds, then 



(7.6) 



E[|4-,fc-/3,,fcn=0(n-P/2). 
Proof. I.i.d. part. By using Rosenthal's inequality, [16], page 132, 

Y^{i^^AG{X,))f{X,) - E[V',-,fc(G(X,))/(X,)]) 
(T.8) 

< Cn-f||/||g„(n2^'(P/2-i) +nf/2) =0(n^P/2) 



EMor = -E 
n 



as long as 2^ <n. 

LRD part. If (1.10), then the LRD part vanishes. Otherwise, note that 

(7.9) E[i^,-,(c/i)a(Xi)n < Mump'^"'^-'^ ■ 
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Since is a centered normal random variable with variance 

(7.10) 7;2.= Var|^f^ei,i_ij , 
we obtain 

(7.11) E|^3|P = 0(2-^P/2n-P°/2)_ 

Martingale part. In the light of the decomposition (7.4), we see that 
nA2 =: J27=i is a martingale, where 

di = '4)j^k{Ui)(T{Xi)ei - 'E[i}j^k{Ui)(T{Xi)ei\Fi-i] 

= £i,,^i{^j,kma{X,) - E[Vj,fc(f/i)a(Xi)]) + 77iVj,fc(f/.)^(^0- 

Note that the first and the second term are uncorrelated, both uncondition- 
ally and conditionally on Ti-i. By (7.9), 

< 2f-i(E[|e,,,_ir]E[|V',-,fc(f/.)^(^*) -E[V'i,fc(f/i)^(^^)]r] 

< CE[|V,,fc(C/i)^T(Xi)n = C2^-(f/2-i). 

Now, 
(7.12) 

= E[Vj„ 

Using E[V'|fc(f/i)<7(Xi)]= 0(1), 

/ n \ P/2 

= E(^nE[^2,(;7i)a(Xi)]E772 + Var[V',,fc(C/i)a(Xi)]^4,_ij 

/ n \P/2 

< C,nP/2(E^|,(C/i))P/^ + C,{Y^Ti;,^k{Ul)r/'E[Y.<^^l) 
Using Rosenthal's inequality for martingales [14], page 25, 

/ n \ p/2 n 

E|^2r < Cn-PEK^E(d2| J-,_i) + C7n-P^E|(i,|f 

\i=l / i=l 

(7.13) 

< (:7(n-P/2 + n-Pn2J'(P/2-i)) < Cn-P/^ 
as long as 2^ < n. Now, (7.7) follows from (7.8), (7.11) and (7.13). □ 



E[^2^(C/i)a2(Xi)]E[7?2]+4^_,Var[^,,fc(C/i)a(Xi)]. 
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7.4. Large deviation estimates. 

Proposition 7.2. Let Xnj be as in (4-4)- Assume that j is such that 
2^ < (n/logn). For any r>0, there exist positive constants r and C{r,p,T) 
such that 

(7.14) > tA„,,/2) < C{r,p,T)n-'-P. 

A similar bound is valid for dj ^ — cxj^k- 

Proof. We obtain (7.14) separately for AqjA^ and A2 and apply trian- 
gular inequalities in (7.4) to complete the proof. A similar approach works 
for (7.5). 

Li.d. part. For Aq, we have from the Bernstein inequality as long as 2^ < 
(n/logn) (see, e.g., [19], Proposition 3) 



„ / , , , T log n\ f 3r log n 

(7.15) P |^o|>7tJ^^ <2exp' ^ 



2\ n )- 8||/||oomax{3,r} 
for all n. The bound in (7.14) is valid for the i.i.d. part with 

(7.16) r>max{|||/|Urp,yMmU}- 

LRD part. First, if (1.10) holds, then LRD part vanishes. Otherwise, we 
recall (7.9) and that J27=i^i,i-i a centered normal r.v. with variance 
(7.10). For sufficiently large n, 



„/, , , /logn / \ ^ / rn log ri2-' 

P |A3|>rW^/2 <C7exp 



Therefore, for all j such that 2^ > n^~'^, 



for all n, if 

(7.17) r>4d„rp||(7||^||V^||?. 
If, now, 2^ < n^^'^, then 

rnA 



P{\A3\>TXnj/2)=P{ 



i=l 



> ^ 



2\E[i;,k{Ui)a{Xi 



rloen 
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for the same choice of r as in (7.17). 

Martingale part. For A2, we wih use a new Bernstein's inequality for 
martingales. We recall the following lemma from [11]. 

Lemma 7.3. Let {di,J^i),i > 1, be a martingale difference sequence. De- 
note af = E[(i?|.Fj_i]. For any x,L,a> 0, 



i=l 



exp 



i=l 



i=l 



X 



2(L + ax/3)y 



We apply this lemma to our martingale sequence di and erf defined in 
(7.12), with a very precise choice of truncation levels a and L (clearly, they 
cannot be too big). Let 



Hn = Hn{a) :=5I^i^{M.I>«} 



1=1 



PI 



(7.18) 



i=l 



>x <P| 



i=l 



< 2 exp 



>X,Hn<Lj+P{Hn>L) 
+ P{Hn>L). 



2(L + anx/3) 



We take 



(7.19) L:=Ln = 2n(Alogn + E[^2fc(f/i)a2(Xi)]E[?7?]) =: 2n{A\ogn + Ci) 
with j4 > to be specified below, 

p{^a^, > = p(^Var[V',-,fc(C/i)a(Xi)]X:4-i 



(7.20) 



+ E[^2,(f/i)a2(Xi)]E[7?2]n>L/2j 
p(^Var[V',-,fc(f/i)a(Xi)] X^e^-i > ^"^og 



n 



> 



2 = 1 

Alogn 



<nP[elo> 
< Cnexp 



Var[^,-fc(C/i)a(Xi)] 
Alogn 



Alogn 



2Var[ei,o]Var[V',-,fc(C/i)a(Xi)] 
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= CiT'-v 

by the choice 

(7.21) A = 2{rp + 1) Var[ei,o] \&i[il>,,k{Ui)a{Xi)]. 
Further, note that 

Thus, for any a > (Alogn)^/^, 

<nP{dll{\a,\>a}>Alogn) 

< nP{dl > (Alogn) Va^) 

(7.22) < nP{dl > a^) = nP{dl > a^) 

<nP((V',-,fc(f/i)a(Xi) 

-E[^l;J,kiUl)a{Xl)]felo>ay2) 
+ nPivl^p]4Ui)a\Xi)>a^/2). 

Since 

\i;l,{U,)a\X,)\ < 2^||^||^||a||^ =: Co2^ < Coj^, 

we have 

P{vli^l,{U^)a\X,) > a'/2) < pLf > ^) 

(7.23) 

^--(-^)^--^^' 

by choosing a = By/n with 

(7.24) B = ACorp. 

A similar bound apphes to the first term in (7.22). 
Combining (7.18), (7.20) and (7.23), 



(7.25) pI- 



n 



i=l 



where L as in (7.19). Take, now, x = 5^^^) ^i^d note 
v?x^ ^ n(logn)^T^/8 



that 



2(L + anx/'i) nlogn(A + Ci) + Sr/6nlogn' 
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SO that (7.14) follows for the martingale part by taking 



r > max{y^8(A+ Ci)rp,Srp|}, 

where A, Ci and B were defined in (7.21), (7.19) and (7.24), respectively. 

□ 

7.5. Bounds for the modified wavelet coefficients. Let us start with the 
following bound: 

Lemma 7.4. For all 2^ < ^fn, we have 
(7.26) E[|V',,fc(G„(Xi))c7(Xi)n < 

where Cp is a constant depending only on p. 

Proof. Let @n{x) be a random element between Gn{x) and G{x). 
Then, 

E[|(^,,fe(G„(Xi)) - V,,fc(G(X,)))a(Xi)|P] 

<EE[|V^',(G(Xi))ne„(Xi)|P|a(Xi)nXi] 



<lkllLE 



sup I 



e„(x)rlE[|V'U(G(Xi))n=0(n-P/223^P/2-^-). 



In the above computation, we used independence of Gn{-) of Xi and the 
standard bound on the supremum norm of the empirical process. Conse- 
quently, 

E[|V^,, ,(G4Xi))ct(Xi)|P] < E[|V',,fc(G(Xi))cT(Xi)|f] + 0(n-P/223ip/2-i) 
and the bound is of order 2^^'^^'^"'^^ if and only if 2^ < y/n. □ 

With help of the above lemma, we conclude that the results for Pj^k can 
be rewritten for Pj^k- 

Lemma 7.5. Assume that ||/oG~"^||Lip(i/2) < c>o. The bounds of Lemma 
7.1 and Proposition 7.2 remain valid for (3j^k o,^d aj^k c-s long as 2^ < yjn. 

Proof. The bounds for the first part of the decomposition (7.6), ^Iq, 
follow from [19], Proposition 6. To deal with the LRD part, ^3, we simply 
replace (7.9) with (7.26) [see (7.11) and the computation leading to (7.17)]. 
Similarly, note that the moment bounds and large deviations for the mar- 
tingale part involve only the behavior of E[|'0j k{Gn{X.\))a{X\y^\ instead of 
E[|^,-fc(G(Xi))a(Xi)n. □ 
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7.6. Proof of Theorem 4-i- In what follows, Dj = {k, = 0, 1, . . . , 2-' — 1}, 
we split fn — f into three parts, 



E||/-/, 



< E 



p 

LP [9) 



+ E 



E E Pj,k^j,k{Gi-)) 

3=30 keDj 
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E E 



i=io k&Dj 



+ E E P3,ki^3,km-)) 

j>ji keDj 

:= linear term + nonlinear term + bias term. 



LP (9) 
LP{g)J 



Bias term. We use standard approximation results (see, e.g., [16], pages 
123-124), introducing 



(7.27) 



vr p 



,1 1 , 

s — max , , 

.vr p 



if p < vr, 5 = s and B'^^, C B;^,, if vr < 5 = s - (^ - i) and B^, C 5^ „ 

p 

LP{g) 



E E P3,k^3,k{G{-)) 

j>ji k€Dj 



(7.28) 



/' E E /3i,fc^,,fc(G'(x)) 
j>jikeDj 

/ E E Phk^hki^ 



5((x) (ix 



< C7||/ o 2"^^'P = 0{{\ogn/n)'P), 

p.r 

where we have used the definition (4.1) of ji for the last bound. 



The linear part. Applying Lemma 7.1, the term E|ajg^fc — ajo,fc|^ is pro- 
portional to Therefore, 



E 



E("jo,fc - Oijo,k)(t>jo,k{G{-)) 
k 



p 

LP {9) 
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JO 



Nonlinear term. We follow the proof of Theorem 5.1 in [18], incorporating 
our moments and large deviations bounds accordingly. We refer to Appendix 
for the definition Iq^^o spaces. We use Temlyakov's property and Minkowski's 
inequality repeatedly. 



E 



Y: /?,,.v,,.(g(-)) - E %^-)i{i/3,,,i>.„A„,}V'.,fc(G(-)) 



(i,fc)6Ai 
<2P-M E 



j.fceAi 



(j,fc)GAi 

+ E 



{|/3j,fcl>ToA„,,}V^J 



E ^J''=^{l4fc|<roA„,,} V'i,fc(G'(-)) 
(i,fc)eAi 



--■.A + B. 



Let us introduce some notation. We define j2 to be such that 2^^ = . 
Further, let 

A2 = {(i, fc),i2 < j < ii, = 0, 1, . . . , 2? - 1}, A3 = Ai \ A2. 

We start by the A-term. Changing variables u = G{x) we get 

\Pj,k - /^J,fel%|/3,,,-/3,,fc|>rA„,,/2}^j,fc(^) J 



^ < eII^ E l/^i.^-/'j-.^l%i/3,..-/3,,.|>rA„„/2}€fc(^)) ' 



'(i,fc)GAi 



</{ E [{m,k-f3j,k 



|2p 



■(j,fc)GAi 



^ P/2 

^l/2u/,../„MPl2/pl 



+ /{ E I{|/3,,.|>rA„,,/2}[E|4fc-/3,>P]^/^V'|,.(^)pci^ 



-(j,fc)eAi 
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Using the bounds of Lemma 7.1 and (A. 2) below, 



'(j,fc)eA2 

j2 



I? 

j=0 



+ C(^nYl IIV'i,fcllp E I{|/3,,fe|>roA„,,/2} 
j=j2 k£Dj 

A>0 j=lkeDj 

<Cn-P'^/'logn + C~X'^-'i\\f\\l^. 

In the second to last inequality, we used < and the fact that for j > j2 
we have Xnj = A„. 

As for Ai, we split this into 2 parts, according to A2 and A3. On A2, using 
Lemma 7.1 and Proposition 7.2, we get (recall that then A„j = A„) 

im,k - M'^^'Pilkk - M > roXn,,/2)f' = 0((c2PA2P)V2) = a^^. 
On A3, we have 

= 0(2-^>n-"P(logn)P/2), 

so that 

p 

(i,fc)eA3 



Ai <Cn-°f(lognf/2 ^ 2-^P\\y 
(i,fc)eA3 

(7.29) +cxlpj l^jAy'Wdu 



(j,fc)eA2 



< C7n-"P(logn)P/22-J2P/2 + (^7^2^ ^ 2^2^(^/2-1) ^ 



J=J2 
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For the B-term. 

B< l{ E I{|/3,.|>2roA„,,}^(l4fc-/3i,;i-l>^oA„,,V2)2/P/52,V'|fc(^^ 
^(i,fc)eAi ^ 

+ [{ J2 k\f^,.k\<^roK,,}Plk^'j,kiu)] du=:Bi+B2 

and both terms are treated in the similar way as Ai and A2, respectively. 
Summarizing, the upper bound for the nonlinear term is 

(7.30) 0(11/11^^ ^Ar^ + n-P'^/^logn). 

Rate results. The overall rate of convergence depends on the three main 
contributing terms, the bias term, the linear term and the nonlinear term, 

E||/-/n|li.(,)=0(A2^n + 0(n-^"/2) 

(7.31) 

+ 0{\\f\\lJ^r') + Oin-^''/'logn). 

The dense phase. This is the region where a > an, S = s and s > (p — 
7r)/27r. For a > ao, the linear term is negligible, since n~^"/^ = o(n~P"^/^). 
The bias term is negligible too since = (logri)^*^?!"*^ = o(n~P°^/^) for 
s> 1/2. For the nonlinear term we note that, for q = qo '■= 2.^+1 ' 

~XP^-1 = a2p-/(2^+1) = = n-^f/(2s+l)(i^g^)2sp/(2.+l)^ 

which is the convergence rate under the dense regime. To complete the proof, 
we apply the Besov embedding 1 of Theorem A.l, noting that, in the dense 
region, we always have tt > q^). 

The sparse phase. Here, a > as, 6 = s — {I/tt — 1/p) and s < (j> — tt)/2tt. 
For a > as, the linear term contribution is negligible since n"^"/^ = o(n~^"s/2)_ 
The bias term is negligible, too, since, for s > l/vr, we have A^*"^ = o{n~'''°'^^'^). 
For the nonlinear term we note that for q = qs = g^^i/^-i/n) have A^""^ = 
Apas = n~^°^/2(^iogn)2^'"s/2 -^^i^ich is the convergence rate under the sparse 
regime. To complete the proof, we apply the Besov embedding 3 of Theorem 
A.l, noting that, in the sparse region, we always have vr < q^. 

The LRD phase. This is the region where a < mm{as,aD)- In this case, 
we have, for s in the dense setting, n~^/2"D — Q^j^-p/2a^ and, for s in the 
sparse setting, 7i~p/2"s = o{n~P^'^°'). 
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7.7. Proof of Theorem 4-8. Let us write 
fn{x) - f{x) 



(i,fc)eAi 



+ [ hkH\Pj,k\>^oK,j}i^j,k{G{x))- Pj,ki^jMG{x))^ 



^{j,k)eA 



(j,fe)eAi 



<C||/oG-i||^, 2- 



-ji5p 



{j,k)eAi 

+ E ^3,,k{i^J,kiGnix))-^P,,kiGix))}■ 
(j,fc)eAi 

Now, replacing Lemma 7.1 and Proposition 7.2 with Lemma 7.5, we may 
proceed as in the proof of Theorem 4.1 to conclude that the second part of 
the above decomposition is bounded with the desired rate. The third part is 
clearly of the smaller order than the second one. Furthermore, for the bias 
term we have 

j>jikeDj ^"(9) 

= 0{{logn/nff/^). 

Note that we have a different bound than compared to (7.28), since, here, 
we stopped earlier (i.e., 2^^ ~ ^n/logn). Nevertheless, comparing the bias 
term with the rate in the dense phase, we see that, with the choice 5 = s, 
we have n"*^/^ < n"*^/^^*"''^-*, if s > 1/2. Furthermore, in the sparse phase, 
by choosing J = s — (l/vr — 1 /p) , we see that the bias is negligible as long as 
s>l/7r + l/2. 

Therefore, to finish the proof of Theorem 4.8, it suffices to bound the last 
part. We have, by using Holder inequality, 

E 



V 

LP (9) 



E Pj,k{^j,kiGnix)) - Vi,fc(G(x))} 

(j,fc)6Ai 

<E||G„-G||So E ll/3.M.(G(-))|li 
{i,fc)GAi 

= 0(n-P/2) E 2^'(3p/2-i)|/j^.,|P 
(i,fc)eAi 

= 0(n-P/2) J2 2J>2-J''5p('2^>(^+i/2-i/p)^|^^.^|P 
j<ji ^ k 
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P 



0(n-?'/22JiPa-5))||/oG~ij 

/ , / 77 \P(l-'5)/2\ 



V log 71 



= 0(max{77-*'/2, 77-^^'/2})||/ o G~Y.s ■ 

Up < TT, take 5 = s,so that B'^^^ C 5^ The above rate is then 0(^-^(2^+1)) 

for s > 1/2. If p > vr, take 5 = s — (l/vr — The above rate is max{77-P/^, 
^_(^_(i/^_i/p))p/2| -g s^j^iigj, ^j^j^^ ,^-pas/2 as long as s > 1/2 + l/vr. 

Remark 7.6. Let us consider 

^ = {fj,k = Pj,kipj,k,j > 1,A; = 0,...,2^ - 1}, 

where Pj^k = 2^^^^'^^^'^^^^'^\ and we assume that Pj^k are known. We re- 
cover the function fj^^ by using the estimator f3j^k'4^j^k{Gn{-))- Its expected 
weighted mean square loss, E|| • |||2 



(9)' 



IS 



|V'i,fc(G'n(2;)) - iljj^k{G{x))\'^g{x) dx 



By considering the first term in the Taylor expansion, the above expected 
value is of the order 



E 



23i 

77 

2^^' ^^ofv + k\ v + k\ ^ 

1 -r— dv. 



{'^'^^k{G{x)){Gn{x)-G{x))Yg{x)dx 
{i''j,k{u)Yu{l - u)du 

{il>'{2^u-k)}'^u{l-u)du 

Take /c = 2-^/^. Then, the above expression is of the order 2^^^^ /n. Now, if 
we choose j ~ jj^, then the expected weighted mean square loss is of the 
order 

(7.32) /32,2^V2^ ^ 2-2^'(^+^/2-i/-)2i/2^ ^ ^-2(.+i/4-i/.) ^ ^^^^^ 

-'' fi log 77 

Choose, for simplicity, vr = 1. Since also p = 2, there is no sparse phase 
and the only restriction (in the Theorem 4.1) in the dense phase is s > 
1. However, we note that the rate in (7.32) is of the smaller order than 
^-2^/(2^+1) -f Q^^y if s < 1 - |\/33 or s > I + ^VSS > 1. Consequently, 
we cannot stop the fully adaptive estimator at 2^'^ ~ and keep the same 
restriction on s, as in the case of the partially adaptive one. 
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APPENDIX: BESOV EMBEDDING IN Lq,oo SPACES 

We give a simplified version of Tlieorem 6.2 [18], when the dimension 
d = l and aj = 1. Let fi will denote the measure such that for j G N, A; G N, 

— II,/, . . IIP — 9j(P/2-l)|U/,ll 



(A.l) ^^{{J,k)} = ||V^,,fc||^ = 2^-(^/2-i)||V.||^ 



(A.2) 
and 



igoo :=supAV{(i,^) : |/3j,fc| > A} < oo 
A>0 



j,k j,k£Aj 

where Aj is a set of cardinality proportional to 

Theorem A.l. Let < p < oo,0 < s < oo be fixed and let qo = p/ 
{2s + I): 

1. If IT > qd, then for all r, <r < oo, B^^. C C lqjy,oo- 

2. If TT = qd, then for all r, < r < vr, B^ ^ C B^ ^ cIt^. Moreover for r > tt, 
we have: 

- Ifp = 2 then fij,,, C/^. 

- Ifp>2 then for all r>p, B^.,. C B^^^ C Ir- 

3. // 2/(2s + 1) <7T < QD, for all <r <oo, B'^,, C B'^^^ C Iqg^oo, where 

_ p/2-l 

~ s+(l/2-l/7r) • 

REFERENCES 

[1] Beran, J. and Feng, Y. (2001). Local polynomial estimation with a FARIMA- 

GARCH error process. Bernoulli 7 733-750. MR1867080 
[2] Beran, J. and Feng, Y. (2002). Local polynomial fitting with long-memory, 

short-memory and antipersistent errors. Ann. Inst. Statist. Math. 54 291-311. 

MR1910174 

[3] Cai, T. and Brown, L. D. (1999). Wavelet estimation for samples with random 

uniform design. Statist. Probab. Lett. 42 313-321. MR1688134 
[4] Cheng, B. and Robinson, P. M. (1991). Density estimation in strongly dependent 

nonlinear time series. Statist. Sinica 1 335-359. MR1130123 
[5] Cheng, B. and Robinson, P. M. (1994). Semiparametric estimation from time series 

with long-range dependence. J. Econometrics 64 335-353. MR1310526 
[6] Claeskens, G. and Hall, P. (2002). Effect of dependence on stochastic measures 

of accuracy of density estimators. Ann. Statist. 30 431-454. MR1902894 



WAVELET REGRESSION WITH DEPENDENT ERRORS 



35 



[7] Coi-IEN, A., Daubechies, I. and Vial, P. (1993). Wavelets on the interval and fast 

wavelet transforms. Appl. Comput. Harmon. Anal. 1 54-81. MR1256527 
[8] CSORGO, S. and Mielniczuk, J. (1999). Random-design regression under long-range 

dependent errors. Bernoulli 5 209-224. MR1681695 
[9] CsORGO, S. and Mielniczuk, J. (2000). The smoothing dichotomy in random-design 

regression with long-memory errors based on moving averages. Statist. Simca 10 

771-787. MR1787779 

[10] DONOHO, D., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1996). Den- 
sity estimation by wavelet thresholding. Ann. Statist. 24 508-539. MR1394974 

[11] Dzhaparidze, K. and van Zanten, J. H. (2001). On Bernstein-type inequalities for 
martingales. Stochastic Process. Appl. 93 109-117. MR1819486 

[12] Efromovich, S. (1999). How to overcome the curse of long-memory errors. IEEE 
Trans. Inform. Theory 45 1735-1741. MR1699909 

[13] Guo, H. and KOUL, H. L. (2008). Asymptotic inference in some hetereoscedastic 
regression models with long memory design and errors. Ann. Statist. 36 458- 
487. MR2387980 

[14] Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and Its Application. 
Academic Press, New York. MR0624435 

[15] Hall, P., Lahiri, S. N. and Truong, Y. K. (1995). On bandwidth choice for density 
estimation with dependent data. Ann. Statist. 23 2241-2263. MR1389873 

[16] Hardle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1997). 

Wavelets, Approximation, and Statistical Applications. Lecture Notes m Statis- 
tics 129. Springer, New York. MR1618204 

[17] Johnstone, I. M. (1999). Wavelet shrinkage for correlated data and inverse prob- 
lems: Adaptivity results. Statist. Sinica 9 51-83. MR1678881 

[18] Kerkyacharian, G. and Picard, D. (2000). Thresholding algorithms, maxisets and 
well-concentrated bases. Test 9 283-344. MR1821645 

[19] Kerkyacharian, G. and Picard, D. (2004). Regression in random design and 
warped wavelets. Bernoulli 10 1053-1105. MR2108043 

[20] KuLiK, R. and Raimondo, M. (2009). L^ wavelet regression with correlated errors. 
Statist. Sinica. To appear. 

[21] Mallat, S. (1998). A Wavelet Tour of Signal Processing, 2nd ed. Academic Press, 
San Diego, CA. MR1614527 

[22] Masry, E. (2001). Local linear regression estimation under long-range dependence: 
Strong consistency and rates. IEEE Trans. Inform. Theory 47 2863-2875. 
MR1872846 

[23] Masry, E. and Mielniczuk, J. (1999). Local linear regression estimation for 
time series with long-range dependence. Stochastic Process. Appl. 82 173-193. 
MR1700004 

[24] Mielniczuk, J. and Wu, W. B. (2004). On random-design model with dependent 
errors. Statist. Sinica 14 1105-1126. MR2126343 

[25] Pensky, M. and Sapatinas, T. (2009). Functional deconvolution in a periodic set- 
ting: Uniform case. Ann. Statist. 37 73-104. MR2488345 

[26] Robinson, P. M. and Hidalgo, F. J. (1997). Time series regression with long-range 
dependence. Ann. Statist. 25 77-104. MR1429918 

[27] Wang, Y. (1996). Function estimation via wavelet shrinkage for long-memory data. 
Ann. Statist. 24 466-484. MR1394972 

[28] Wu, W. B. and Mielniczuk, J. (2002). Kernel density estimation for linear pro- 
cesses. Ann. Statist. 30 1441-1459. MR1936325 



36 



R. KULIK AND M. RAIMONDO 



[29] Yang, Y. (2001). Nonparametric regression with dependent errors. Bernoulli 7 633- 
655. MR1849372 

[30] Yao, Q. and TONG, H. (1998). Cross- validatory bandwidth selections for regression 
estimation based on dependent data. J. Statist. Plann. Inference 68 387-415. 
MR1629607 



Department of Mathematics 

AND Statistics 
University of Ottawa 
585 King Edward Avenue 
Ottawa ON KIN 6N5 
Canada 

E-MAIL: rkulik@uottawa.ca 



